Abstract

Let $H_0$ be a selfadjoint operator such that $e^{-\beta H_0}$ is of trace class for some $\beta < 1$,
and let $\mathcal{X}_\epsilon$ denote the set of $\epsilon$-bounded forms, i.e.
$\|(H_0+C)^{-1/2-\epsilon} X (H_0+C)^{-1/2+\epsilon}\| < C$ for some $C > 0$. Let
$\mathcal{X} := \mathrm{Span} \bigcup_{\epsilon\in(0,1/2]} \mathcal{X}_\epsilon$. Let $\mathcal{M}$ denote the
underlying set of the quantum information manifold of states of the form
$\rho_X = e^{-H_0 - X - \psi_X}$, $X \in \mathcal{X}$. We show that if $\mathrm{Tr}\, e^{-H_0} = 1$,

1. the map $\Phi$,

$$\Phi(X) = \tfrac{1}{2}\mathrm{Tr}\left(e^{-H_0+X} + e^{-H_0-X}\right) - 1$$

is a quantum Young function defined on $\mathcal{X}$;

2. the Orlicz space defined by $\Phi$ is the tangent space of $\mathcal{M}$ at $\rho_0$; its affine structure
is defined by the (+1)-connection of Amari;

3. the subset of a 'hood of $\rho_0$ consisting of $p$-nearby states (those $\sigma \in \mathcal{M}$ obeying
$C^{-1}\rho_0^{1+p} \le \sigma \le C\rho_0^{1-p}$ for some $C > 1$) admits a flat affine connection known as
the (-1)-connection, and the span of this set is part of the cotangent space of $\mathcal{M}$;

4. these dual structures extend to the completions in the Luxemburg norms.



1 Introduction 

1.1 The need for an Orlicz topology 

Let $\mathcal{H}$ be a separable Hilbert space of infinite dimension, and $B(\mathcal{H})$ the W*-algebra of all
bounded operators on $\mathcal{H}$. The set $\Sigma$ of all normal states can be furnished with the trace
norm. However, the associated metric is not a good measure of the distance between states.
For example, if $\rho_0 \in \Sigma$ has finite entropy:

$$S(\rho_0) := -\mathrm{Tr}\,\rho_0 \log \rho_0 < \infty, \tag{1}$$

any trace-norm 'hood of $\rho_0 \in \Sigma$ contains a dense set of states of infinite entropy. These
states cannot be near $\rho_0$ in any physical sense. Moreover, if $\{\rho(t), t \ge 0\}$ is the dynamics of
a system, then we expect that $S(\rho(t)|\rho(0)) < \infty$ for all $t > 0$, where

$$S(\sigma|\rho) := \mathrm{Tr}\,\rho(\log \rho - \log \sigma) \tag{2}$$

is the relative entropy of Umegaki. This, which is related to the free energy, should be finite
for physical states. We need a stronger topology, such that a 'hood of $\rho_0$ contains only
states $\rho$ for which $S(\rho|\rho_0) < \infty$. In this paper we show that this goal can be achieved by a
norm topology, by developing an analogue of the $L \log L$ space: the unwanted states near
a given state $\rho_0$ are outside the space of finite norm. The norm is a limiting case of the
Schatten cross norms on spaces of compact operators, which can be regarded as quantum
versions of Orlicz spaces. Orlicz spaces were first introduced into information geometry in
the classical case by Pistone and Sempi [15].

1.2 The work of Pistone and Sempi 

These authors are statisticians, who developed a theory of best estimators (of minimum 
variance) among all locally unbiased estimators in non-parametric estimation for classical 
statistical theory. 

Let $(X, \mu)$ be a measure space, and $f$ the density of a probability measure equivalent to
$\mu$. Thus,

$$f(x) > 0 \ \ \mu\text{-almost everywhere, and}\quad E_\mu[f] := \int_X f(x)\,\mu(dx) = 1. \tag{3}$$

Let $f_0$ be such a density; we seek a useful family of sets $\mathcal{N}$ containing $f_0$, designed to exclude
the states of infinite entropy, and which can be taken to define neighbourhoods of $f_0$. We
then do the same for each point of $\mathcal{N}$, and so on, thus constructing a topological space.
Consider the class of measures of the form

$$f = f_0 \exp\{u - \psi_{f_0}(u)\} \tag{4}$$

in which $\psi$, called the free energy, is finite for all states of a one-parameter exponential
family:

$$\psi_{f_0}(\lambda u) := \log E_{\mu f_0}\left[e^{\lambda u}\right] < \infty \quad\text{for all } \lambda \in [-\epsilon, \epsilon]. \tag{5}$$

This implies that all moments of $u$ exist in the probability measure $\nu = \mu f_0$, and the moment
generating function is analytic in a 'hood of $\lambda = 0$. The random variables $u$ satisfying (5)
for some $\epsilon$ are said to lie in the Cramér class, after the statistician Harald Cramér.

The Cramér class of random variables was shown by Pistone and Sempi to be a Banach
space, and so, to be complete, when furnished with the Luxemburg norm

$$\|u\|_L := \inf\left\{ r > 0 : E_{\mu f_0}\left[\cosh\frac{u}{r} - 1\right] < 1 \right\}. \tag{6}$$

The map

$$u \mapsto \exp\{u - \psi_{f_0}(u)\}\, f_0 =: f_0(u) \tag{7}$$

maps the unit ball in the Cramér class into the class of probability distributions that are
absolutely continuous relative to $\mu$. The identification of $\psi$ as the free energy can be seen
when we write $f_0 = \exp\{-h_0\}$, so that $f = \exp\{-h_0 - u - \psi_f(u)\}$; then $h_0$ appears as
the 'free Hamiltonian', and $u$ as the perturbing potential, of the Gibbs state $\mu f$. Random
variables $u$ and $v$ differing by a constant give rise to the same distribution. The map
(7) becomes bijective if we fix $u$ such that $E_\mu[f_0 u] = 0$, that is, $u$ has zero mean in the
measure $f_0\mu$. Such a $u$ is called a score in statistics. The corresponding family of measures
$\mu f_0(\lambda u)$, for $\lambda \in [-\epsilon, \epsilon]$, is called a one-parameter exponential family. Pistone and Sempi
define the neighbourhood $\mathcal{N}$ of $f_0$ to consist of all distributions in some exponential family,
as $u$ runs over the Cramér class. They add similar 'hoods for each $f \in \mathcal{N}$, and show
that the Luxemburg norms are equivalent on overlapping 'hoods. They thus construct the
information manifold $\mathcal{M}$, which is modelled on the Banach space of functions of Cramér
class; this Banach space is identified with the tangent space at any $f \in \mathcal{M}$. The manifold
is furnished with a Riemannian metric, the Fisher metric, which at $f \in \mathcal{M}$ is the second
Fréchet differential of $\psi_f(u)$. The Cramér class is a special case of an Orlicz space; we now
review this.
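As a small illustration of the Luxemburg norm (6), the following sketch evaluates it for a random variable on a finite sample space by bisection. The finite sample space, the function name `luxemburg_norm` and the tolerance are assumptions made for this example only; they are not part of the theory of Pistone and Sempi.

```python
import math

def luxemburg_norm(u, weights, a=1.0, tol=1e-12):
    """Approximate inf{ r > 0 : sum_i w_i (cosh(u_i / r) - 1) <= a } by bisection.

    u       -- values of the random variable on a finite sample space (assumption)
    weights -- positive probabilities w_i of the reference measure mu * f_0
    """
    def modular(r):
        total = 0.0
        for ui, wi in zip(u, weights):
            x = abs(ui) / r
            if x > 700.0:              # cosh would overflow; treat the modular as infinite
                return math.inf
            total += wi * (math.cosh(x) - 1.0)
        return total

    if all(ui == 0 for ui in u):
        return 0.0
    lo, hi = 0.0, 1.0
    while modular(hi) > a:             # enlarge hi until the modular falls below a
        hi *= 2.0
    while hi - lo > tol * hi:          # the modular is decreasing in r, so bisect
        mid = 0.5 * (lo + hi)
        if modular(mid) > a:
            lo = mid
        else:
            hi = mid
    return hi
```

For the two-point variable u = (1, -1) with equal weights the modular is cosh(1/r) - 1, so the norm is 1/arccosh(2); the sketch also reproduces the homogeneity of the norm.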

1.3 Young functions and Orlicz spaces 

A Young function is a convex map $\Phi : \mathbf{R} \to \mathbf{R}_+ \cup \{\infty\}$ such that

1. $\Phi(x) = \Phi(-x)$

2. $\Phi(0) = 0$

3. $\lim_{x\to\infty} \Phi(x) = +\infty$.

The epigraph of $\Phi$ is the set of points $\{(x,y) : y \ge \Phi(x)\}$; it is closed and convex. Then $\Phi$
is lower semicontinuous, and $\lambda \mapsto \Phi(\lambda x)$ is continuous on any open set on which it is finite
0- 

Examples

$\Phi_1(x) := \cosh x - 1$

$\Phi_2(x) := e^{|x|} - |x| - 1$

$\Phi_3(x) := (1 + |x|)\log(1 + |x|) - |x|$

$\Phi_p(x) := |x|^p$ defined for $1 \le p < \infty$.

Let $\Phi$ be a Young function; then its Legendre-Fenchel dual,

$$\Phi^*(y) := \sup_x \{xy - \Phi(x)\} \tag{8}$$

is also a Young function. From Legendre theory, we see that $\Phi^{**} = \Phi$. For example,
$\Phi_2^* = \Phi_3$, and $\Phi_p^* = \Phi_q$ up to constant factors if $p^{-1} + q^{-1} = 1$.
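The duality $\Phi_2^* = \Phi_3$ can be checked numerically. The sketch below approximates the supremum in (8) on a uniform grid; the grid bounds and step count are arbitrary choices for illustration, and the helper names are hypothetical.

```python
import math

def phi2(x):
    """Phi_2(x) = e^{|x|} - |x| - 1."""
    return math.exp(abs(x)) - abs(x) - 1.0

def phi3(y):
    """Phi_3(y) = (1 + |y|) log(1 + |y|) - |y|."""
    return (1.0 + abs(y)) * math.log(1.0 + abs(y)) - abs(y)

def legendre_dual(phi, y, x_max=10.0, steps=100_000):
    """Approximate sup_x { x*y - phi(x) } on a uniform grid over [0, x_max].

    Since phi is even, it suffices to take x >= 0 and replace y by |y|.
    The grid is an assumption of this sketch; the maximiser must lie below x_max.
    """
    best = -math.inf
    for k in range(steps + 1):
        x = x_max * k / steps
        best = max(best, x * abs(y) - phi(x))
    return best
```

Analytically the supremum for $y \ge 0$ is attained at $x = \log(1+y)$, which gives $(1+y)\log(1+y) - y$, in agreement with the grid values.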

Equivalence. 

We say that two Young functions $\Phi$ and $\Psi$ are equivalent if there exist $0 < c \le C < \infty$
and $x_0 > 0$ such that

$$\Phi(cx) \le \Psi(x) \le \Phi(Cx) \tag{9}$$

for all $x \ge x_0$. We then write $\Phi \equiv \Psi$. The scale of $x$ is not relevant. Duality is an operation
on the equivalence classes:

$$\Phi \equiv \Psi \implies \Phi^* \equiv \Psi^*. \tag{10}$$

For example, $\Phi_1 \equiv \Phi_2$.

The $\Delta_2$-class.

We say that a Young function $\Phi$ satisfies the $\Delta_2$-condition iff there exist $\kappa > 0$ and $x_0 > 0$
such that

$$\Phi(2x) \le \kappa\,\Phi(x) \quad\text{for all } x \ge x_0. \tag{11}$$

For example, $\Phi_p$ and $\Phi_3$ satisfy $\Delta_2$, but not $\Phi_1$ or $\Phi_2$.
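A quick numerical look at the doubling ratio $\Phi(2x)/\Phi(x)$ in (11) makes the distinction visible. This is only a sketch on a few sample points (the choice $p = 3$ and the sample points are assumptions); it illustrates, but does not prove, which functions satisfy the condition.

```python
import math

def phi1(x):
    """Phi_1(x) = cosh x - 1 (fails the Delta_2 condition)."""
    return math.cosh(x) - 1.0

def phi3(x):
    """Phi_3(x) = (1 + |x|) log(1 + |x|) - |x| (satisfies Delta_2)."""
    return (1.0 + abs(x)) * math.log(1.0 + abs(x)) - abs(x)

def phi_p(x, p=3.0):
    """Phi_p(x) = |x|^p; p = 3 is an arbitrary choice for the example."""
    return abs(x) ** p

def doubling_ratios(phi, xs):
    """Return Phi(2x) / Phi(x) at each sample point x > 0."""
    return [phi(2.0 * x) / phi(x) for x in xs]
```

For $\Phi_p$ the ratio is the constant $2^p$, for $\Phi_3$ it stays below 4, and for $\Phi_1$ it grows roughly like $e^{x}$, so no constant $\kappa$ can work.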



The Orlicz class and the Orlicz space 

Let $(\Omega, \mathcal{B}, \nu)$ be a measure space obeying some mild conditions, and let $\Phi$ be a Young
function. The Orlicz class defined by $(\Omega, \mathcal{B}, \nu)$ and $\Phi$ is the set $\hat{L}^\Phi(\nu)$ of real-valued measurable
functions $u$ on $\Omega$ obeying

$$\int_\Omega \Phi(u(x))\,\nu(dx) < \infty. \tag{12}$$

It is a convex set of random variables, and is a vector space iff $\Phi \in \Delta_2$. The Orlicz space
associated with $\Phi$ and $\nu$ is

$$L^\Phi(\nu) := \left\{ u : \int_\Omega \Phi(\alpha u(x))\,\nu(dx) < \infty \text{ for some } \alpha > 0 \right\}. \tag{13}$$

It is a vector space of random variables, and is the span of the Orlicz class. Up to sets of
measure zero, $L^\Phi$ is a Banach space when furnished with the Orlicz norm

$$\|u\|_\Phi := \sup\left\{ \int |uv|\,d\nu : v \in L^{\Phi^*},\ \int \Phi^*(v(x))\,d\nu \le 1 \right\}, \tag{14}$$

or with the equivalent gauge norm, also known as a Luxemburg norm, for any $a > 0$:

$$\|u\|_{L,a} := \inf\left\{ r > 0 : \int \Phi(r^{-1}u(x))\,\nu(dx) < a \right\}. \tag{15}$$

By the Luxemburg norm, denoted $\|u\|_L$, we shall mean the case when $a = 1$. Equivalent
Young functions give equivalent norms, and $L^\Phi$ is separable iff $\Phi \in \Delta_2$.



Analogue of Hölder's inequality

We have Young's inequality

$$xy \le \Phi(x) + \Phi^*(y).$$

This leads to

$$\int |uv|\,\nu(dx) \le 2\|u\|_L \|v\|_{L^*}. \tag{16}$$

Examples. For $\Phi_p(x) := |x|^p$, the Orlicz space is the Lebesgue space $L^p$, and the dual Orlicz
space is $L^q$, where $p^{-1} + q^{-1} = 1$. For $\Phi_1$ we get a non-separable space, sometimes called
the Zygmund space when $\Omega = \mathbf{R}$. It is the dual of $L^{\Phi_3}$, also known as the $L \log L$ space of
distributions of finite differential entropy.
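The inequality (16) can be tested on a finite measure space with the dual pair $\Phi_2$, $\Phi_3$. The sketch below computes both Luxemburg norms by bisection; the discrete measure and all function names are assumptions made for the example.

```python
import math

def phi2(x):
    """Phi_2(x) = e^{|x|} - |x| - 1."""
    return math.exp(abs(x)) - abs(x) - 1.0

def phi3(y):
    """Phi_3(y) = (1 + |y|) log(1 + |y|) - |y|, the Legendre dual of Phi_2."""
    return (1.0 + abs(y)) * math.log(1.0 + abs(y)) - abs(y)

def luxemburg(phi, u, w, tol=1e-12):
    """inf{ r > 0 : sum_i w_i phi(u_i / r) <= 1 } for positive weights w_i."""
    def modular(r):
        try:
            return sum(wi * phi(ui / r) for ui, wi in zip(u, w))
        except OverflowError:          # exp overflow: the modular is effectively infinite
            return math.inf

    if all(ui == 0 for ui in u):
        return 0.0
    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

def pairing(u, v, w):
    """The integral of |u v| against the discrete measure with weights w."""
    return sum(wi * abs(ui * vi) for ui, vi, wi in zip(u, v, w))
```

The factor 2 arises because, after normalising $u$ and $v$ to unit Luxemburg norm, Young's inequality bounds the pairing by the sum of two modulars, each at most 1.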
See ^3 IH] for classical Orlicz theory. 

Squeezing in logarithms 

When $\nu$ is discrete with countable support, the Orlicz spaces associated with $\Phi_p$ are the
$p$-summable sequences $\ell^p$, $1 \le p \le \infty$. These form a nested family of Banach spaces,
with $\ell^1$ the smallest and $\ell^\infty$ the largest. However, this is not the best way to look at
Orlicz spaces. Legendre transforms come into their own in the context of a manifold, as a
transform between the tangent space and the cotangent space at each point. There is only
one manifold, but many coordinatizations. For the information manifold $\mathcal{M}$ of Pistone and
Sempi, the points of the manifold are the probability measures $\nu$ equivalent to $\mu$, and these
form a positive cone inside $L^1(\Omega, \mu)$. This cone can be coordinatized by the Radon-Nikodym
derivatives $f = d\nu/d\mu$. The linear structure of $L^1(\Omega, d\mu)$ provides the tangent space with
an affine structure, which is called the (-1)-affine structure in Amari's notation. Amari has
suggested that we may also use the coordinates

$$\ell_\alpha(f) := f^{(1-\alpha)/2}, \qquad -1 < \alpha < 1, \tag{17}$$

known as the Amari embeddings of the manifold into $L^p$, where $p = 2(1-\alpha)^{-1}$ (since
$f \in L^1$, we have $u = f^{(1-\alpha)/2} \in L^p$). Thus, the Orlicz spaces of all the Young functions
$|u|^p$ give the same topology on the manifold, namely, that of $L^1$. So they do not help in
eliminating states of infinite relative entropy. These coordinates do provide us with an
interesting family of connections, $\nabla_\alpha := \partial/\partial\ell_\alpha$, which define the Amari affine structures.

We do better with the formal limit as $p \to \infty$. In the discrete case, the relative entropy
is the limit as $\alpha \to 1$ of the Hasegawa-Petz $\alpha$-entropies

$$S(g|f) := \sum_x f(x)\left(\log f(x) - \log g(x)\right) \tag{18}$$

$$= \lim_{\alpha\to 1} \sum_x (1-\alpha)^{-1}\left(f(x) - f(x)^\alpha g(x)^{1-\alpha}\right). \tag{19}$$

It turns out that $S_\alpha(f|g)$ is the 'divergence' of the Fisher metric along the $\alpha$-geodesics.
The relative entropy $S(g|f)$ arises as the divergence along the geodesics provided by the
embedding

$$\ell_1(f) := \log f.$$

Thus the affine structure corresponds to the linear structure of the random variables $u$ where
$f = f_0 e^u$, as in the theory of Pistone and Sempi. The topology given by the corresponding
Young function $\Phi_3$ is not equivalent to that of $L^1$, but gives rise to the smaller space $L \log L$,
as wanted.
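The limit relating (18) and (19) is easy to observe for probability vectors on a finite set. In this sketch the function names and the sample distributions are illustrative assumptions; the argument order follows $S(g|f)$ in the text.

```python
import math

def relative_entropy(g, f):
    """S(g|f) = sum_x f(x) (log f(x) - log g(x)), for strictly positive f and g."""
    return sum(fx * (math.log(fx) - math.log(gx)) for fx, gx in zip(f, g))

def alpha_entropy(g, f, alpha):
    """(1 - alpha)^{-1} sum_x ( f(x) - f(x)^alpha g(x)^(1 - alpha) ), for alpha != 1."""
    total = sum(fx - fx ** alpha * gx ** (1.0 - alpha) for fx, gx in zip(f, g))
    return total / (1.0 - alpha)
```

Expanding $g^{1-\alpha} f^{\alpha} = f \exp\{(1-\alpha)(\log g - \log f)\}$ to first order in $1-\alpha$ shows why the two expressions agree in the limit.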

Is there a quantum analogue to this theory? 



2 The quantum information manifold 

2.1 The underlying set of the manifold 

Let $\mathcal{H}$ be a separable Hilbert space, with $B(\mathcal{H})$ denoting the algebra of bounded operators
on $\mathcal{H}$, and denote by $\Sigma_+$ the set of faithful normal states on $B(\mathcal{H})$. In [17] it was suggested
that the quantum information manifold $\mathcal{M}$ in infinite dimensions should consist of $\rho \in \Sigma_+$
with the property that there exists $\beta_0 \in [0, 1)$ such that $\rho^\beta$ is of trace class for all $\beta > \beta_0$.
That is, states $\rho \in \mathcal{M}$ are a bit smoother than general density operators, in that some
fractional power of $\rho$ is also of trace class. This condition is satisfied by the temperature
states of the harmonic oscillator (for which $\beta_0 = 0$) and most elementary systems, as well
as quantum field theory, in a box with periodic boundary conditions. Thus, for given $\beta$, the
state must lie in the class $\mathcal{C}_\beta$ of Schatten, in the unfashionable case $\beta < 1$; this is a complete
metrisable space of compact operators furnished with the quasi-norm $\rho \mapsto (\mathrm{Tr}\,|\rho|^\beta)^{1/\beta}$.
In [17] we took the underlying set of the quantum information manifold to be

$$\mathcal{M} := \bigcup_{0<\beta<1} \mathcal{C}_\beta \cap \Sigma_+. \tag{20}$$

All these states have finite von Neumann entropy. In limiting the theory to faithful states,
we are imitating the decision of Pistone and Sempi that the probability measures of the
information manifold should be equivalent to the guiding measure $\mu$, rather than, say, merely
absolutely continuous; here, the trace is the quantum analogue of the measure $\mu$. Given a
point $\rho_0 \in \mathcal{M}$, we seek an analogue of the Cramér class at $\rho_0$.

2.2 The quantum Cramer class 

Let us write an arbitrary state $\rho_0 \in \mathcal{M}$ as

$$\rho_0 = \exp\{-H_0 - \psi_0\}. \tag{21}$$

This is always possible, since $\rho_0$ is faithful. The choice of $H_0$ is ambiguous up to a multiple
of the identity, since this can be absorbed into $\psi_0$, defined by

$$\psi_0 = \log \mathrm{Tr} \exp\{-H_0\} = \log Z_0.$$

Thus there is no loss in generality by taking $Z_0 = 1$.

We perturb a given state $\rho_0$ by adding a potential to $H_0$, in analogy with the classical
theory, where the potential is $u$ as in (4). Suppose that $X$ is a quadratic form such that
$\mathrm{Dom}\,X \supseteq \mathrm{Dom}\,H_0^{1/2}$ and there exist positive $a$, $b$ such that

$$|X(\varphi,\varphi)| \le a\langle H_0^{1/2}\varphi, H_0^{1/2}\varphi\rangle + b\|\varphi\|^2 \tag{22}$$

for all $\varphi \in \mathrm{Dom}\,H_0^{1/2}$. Then we say that $X$ is form-bounded relative to $H_0$. The infimum of
all $a$ satisfying (22) is called the $H_0$-form bound of $X$; we shall denote the form bound by
$\|X\|_K$, in honour of T. Kato. It is a semi-norm on the linear set of forms bounded relative
to $H_0$. It is well known that if $\|X\|_K < 1$, then $H_0 + X$ defines a semibounded self-adjoint
operator. Moreover, if $\|X\|_K$ is small enough, less than $a(\beta_0)$ say, depending on $\beta_0$, then [17]
we have

$$\rho_X := e^{-H_0 - X - \psi_X} \in \mathcal{M}. \tag{23}$$

To prove that $\rho_X$ is of trace class, write $-H_0 - X = -\beta H_0 - (1-\beta)H_0 - X$; then by the
Golden-Thompson inequality, taking $\beta_0 < \beta < 1$,

$$\mathrm{Tr}\, e^{-H_0 - X} \le \mathrm{Tr}\left(e^{-\beta H_0}\, e^{-(1-\beta)H_0 - X}\right) \le \|e^{-\beta H_0}\|_1 \, \|e^{-(1-\beta)H_0 - X}\|_\infty < \infty.$$

More is true: $\rho_X$ inherits the properties of $\rho_0$ with a new $\beta_0$ nearer 1, and lies in $\mathcal{M}$.
In [SI, we added a further condition on the quadratic form, called $\epsilon$-boundedness:

Definition 1 For any $\epsilon \in (0, 1/2]$ we say that a quadratic form $X$ is $\epsilon$-bounded by $H_0$ if
there exists a constant $C$ such that

$$\|(H_0 + C)^{-1/2-\epsilon}\, X\, (H_0 + C)^{-1/2+\epsilon}\| < C.$$

The set of states satisfying the condition of Definition 1 is obviously (+1)-convex; that is, if
$X_1$ and $X_2$ satisfy it then so does $\lambda X_1 + (1-\lambda)X_2$. We showed [B] that the free energy is an
analytic function of the perturbation parameter in a small 'hood of zero. This, then, is an
analogue of the Cramér condition. Here, we use this condition to specify the tangent space of
$\mathcal{M}$ at $\rho_0$.



For the cotangent space, we replace $\epsilon$-boundedness by a (possibly) different set, of states $\rho_X$ that are
$p$-nearby $\rho_0$, defined and used in jTS]: for some $C > 1$, and $p \in (0,1)$,

$$C^{-1}\rho_0^{1+p} \le \rho_X \le C\rho_0^{1-p}. \tag{24}$$

This set is (-1)-convex; that is, if $\rho_1$ and $\rho_2$ are
both $p$-nearby $\rho_0$, then so is $\lambda\rho_1 + (1-\lambda)\rho_2$. It is not known whether it is (+1)-convex
unless $p = 0$. It is not hard to show that (24) implies that for small enough $p$, $\rho_X \in \mathcal{M}$ [19].
It is easy to show that the intersection of the sets

$$\left[\bigcup_{\epsilon>0} \{\rho_X : X \text{ is } \epsilon\text{-bounded}\}\right] \cap \left[\bigcup_{p\in(0,1)} \{\text{states } p\text{-nearby } \rho_0\}\right] \tag{25}$$

contains the set of states with finite Araki norm ^Hj; this set carries both affine structures.

Our strategy in furnishing $\mathcal{M}$ with a topology is a quantum version of [15]. We
parametrise states near $\rho_0$ by the potential $X$, and can adjust $X$ and $\psi_X$ so that the
generalised mean $\rho_0 \cdot X$ of $X$ in the state $\rho_0$, proved to be finite in [17], is zero:

$$\rho_0 \cdot X = 0.$$

These are the quantum scores. The (+1) affine structure on forms satisfying the $\epsilon$-boundedness
condition gives, by transfer of structure, an affine structure to the corresponding part of $\mathcal{M}$.
Thus we get a piece $\mathcal{N}$ of a flat manifold modelled on a vector space. When furnished with the
$\epsilon$-norms, with any point of $\mathcal{N}$ replacing $\rho_0$, the norms on overlapping 'hoods are equivalent.
We thus get a Banach manifold. It has the interesting property that there are no global linear
coordinates, even though the coordinate patches are linear with linear transition functions. To
see this, consider perturbations of the form $X = (k-1)H_0$, which is $H_0$-small enough if $k$ is
close to 1. We cannot use the perturbation if $k = 0$, as then $X = -H_0$, and $\exp\{-(H_0 + X)\} = I$,
which is not of trace class. Roughly speaking, the manifold is a convex cone pointing in the
general direction of $H_0$. This suggests that the correct Orlicz space must fail to satisfy the
(technical) $\Delta_2$-condition. The Orlicz class at $\rho_0$, which is always convex but might not itself
be linear, should allow only perturbations $X$ with sufficiently small norm. Then the Orlicz
space, the linear span of the Orlicz class, parametrises the tangent space of $\mathcal{M}$ at $\rho_0$, but
the scores will not provide a valid parametrization of the whole manifold. Our suggested
Young function, below, shows these features.

2.3 The category of partially ordered Riemannian manifolds

Amari has posed the question [2]: what properties, extra to being a Riemannian manifold,
characterise information manifolds? Obviously, such a manifold must possess the Amari
family of affine connections, $\{\nabla_\alpha\}$, with $\nabla_\alpha$ dual to $\nabla_{-\alpha}$ relative to the metric. One could
ask the same question for quantum information manifolds. These affine connections are
associated with the embeddings (17), which can be extended to weights (non-normalised
probabilities) and have quantum versions

$$\ell_\alpha(\rho) := \rho^{1/p}, \qquad p = 2/(1-\alpha), \qquad -1 < \alpha < 1. \tag{26}$$

The quantum versions of the limit cases, $\alpha \to \pm 1$, are

$$\ell_+(\rho) := \log\rho \quad\text{and}\quad \ell_-(\rho) := \rho. \tag{27}$$



It is a fact that all these maps are operator monotone; they preserve the partial order
between operators. We say $A \ge B$ if $A - B$ is a positive (semidefinite) operator. Let us say
that a coordinate system $\rho \mapsto \ell(\rho)$ for the set of weights is a monotone coordinate system if
this partial ordering is preserved. An allowed coordinate system for the quantum case must
be monotone, and a morphism between two information manifolds must involve monotone
maps. This differs from Chentsov's definition of morphism; it allows non-linear changes
of coordinates, which transform one monotone metric to another ^Hl. This suggests the
following definition:

Definition 2 Let $\mathcal{M}_1$ and $\mathcal{M}_2$ be Riemannian manifolds with partial orders $\le_1$ and $\le_2$.
A map $T : \mathcal{M}_1 \to \mathcal{M}_2$ is called a morphism of partially ordered Riemannian manifolds if $T$
is a morphism of Riemannian manifolds and maps any two comparable points of $\mathcal{M}_1$ into
comparable points of $\mathcal{M}_2$, and the order is preserved.

This defines the category of partially ordered Riemannian manifolds. For example, in finite
dimensions consider the set of faithful weights in $M_n$ furnished with the BKM metric,
$g_B$. Then an operator monotone bijection on this set transforms $g_B$ to another monotone
metric, $g$. According to [10], by this means we can get any of the monotone metrics as
classified by Petz. Thus all the models are isomorphic when regarded as partially
ordered Riemannian manifolds.

2.4 The quantum Young function 

In some recent work [Q on quantum Orlicz spaces, use was made of classical Young functions.
Thus, if $A$ is a self-adjoint operator, and $\Phi$ is a Young function, one can take as the quantum
Young function the map $A \mapsto \mathrm{Tr}\,\Phi(|A|)$, where $|A|$ is the rearranged operator. For the cases
$-1 < \alpha < 1$, this gives us back the trace-norm topology, as explained in the classical case
above, when we put $A = \rho$. In the limit case $\ell_+$, we encounter $\cosh X$, which does not
make sense for forms, and also does not correspond to the perturbation by a potential. In the
classical case, $f_0 e^{-u} = e^{-h_0 - u}$, but in the quantum case, $\rho_0 e^{-X}$ is not hermitian, even if
$X$ is a bounded operator, unless $[H_0, X] = 0$. The Young function $\Phi_1 = \cosh x - 1$ gets
multiplied by $f_0$ in the classical theory (c.f. (6)), but the quantum analogue of this would be
$\mathrm{Tr}\left(\rho_0(\cosh|X| - 1)\right)$, which is not positive. We therefore take a different choice of ordering
for the non-commuting variables, and suggest JZj that the quantum Young function at $\rho_0$
should be

$$\Phi(X) := \tfrac{1}{2}\mathrm{Tr}\left(e^{-H_0+X} + e^{-H_0-X}\right) - 1. \tag{28}$$

If $H_0$ commutes with $X \in B(\mathcal{H})$, this reduces to $\mathrm{Tr}\,\rho_0\Phi_1(X)$. Since this already includes
the factor $\rho_0$, we must omit this factor in the analogues of (6) and the rest. For $p$-nearby
states, we can take the analysis of [17] further:

Theorem 3 Suppose that $\rho_X$ is $p$-nearby $\rho_0$, for some $p < 1 - \beta_X$. Then the BKM regularised
metric

$$\langle X, X\rangle_B := \int_0^1 \mathrm{Tr}\left[\rho_0^{\alpha} X \rho_0^{1-\alpha} X\right] d\alpha \tag{29}$$

is well-defined.

PROOF: Since $X = H_X - H_0$, it is enough to consider the case where each $X$ is replaced
by $H_X$, as the remaining terms involve $H_0$ and are easily bounded. We suppose that
$\rho_0 \le C\rho_X^{1-p}$; since $x \mapsto x^\alpha$ is operator monotone for $0 < \alpha < 1$, we see that

$$\rho_X^{-\alpha(1-p)/2}\, \rho_0^{\alpha}\, \rho_X^{-\alpha(1-p)/2}$$

is a bounded operator; the same goes if $\alpha$ is replaced by $1 - \alpha$. We write the integrand as
the product

$$\rho_0^{\alpha} H_X \rho_0^{1-\alpha} H_X = \rho_X^{\alpha(1-p)/2}\left(\rho_X^{-\alpha(1-p)/2} \rho_0^{\alpha} \rho_X^{-\alpha(1-p)/2}\right)\rho_X^{\alpha(1-p)/2} H_X \times$$

$$\rho_X^{(1-p)(1-\alpha)/2}\left(\rho_X^{-(1-p)(1-\alpha)/2} \rho_0^{1-\alpha} \rho_X^{-(1-p)(1-\alpha)/2}\right)\rho_X^{(1-p)(1-\alpha)/2} H_X,$$

of which the trace (by Hölder's inequality for traces) is bounded by an expression involving the
Hilbert-Schmidt norm of $\rho_X^{(1-p)/2} H_X$. The Hilbert-Schmidt norm is finite:

$$\rho_X^{(1-p)/2} H_X = \rho_X^{(1-p-\delta)/2}\left(\rho_X^{\delta/2} H_X\right)$$

for any small $\delta > 0$ is the product of a Hilbert-Schmidt operator and a bounded operator
with norms independent of $\alpha$; thus the integral is finite. QED

Corollary 4 The usual proof of the Bogoliubov-Peierls inequality holds, to arrive at the
inequality

$$\log \mathrm{Tr}\, e^{-H_0 + X} \ge \log \mathrm{Tr}\, e^{-H_0} + \rho_0 \cdot X.$$
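In finite dimensions the inequality of Corollary 4 can be checked directly. The sketch below treats a two-level system with diagonal $H_0$ and a real symmetric perturbation $X$, and it assumes that the generalised mean $\rho_0 \cdot X$ reduces to $\mathrm{Tr}\,\rho_0 X$ for matrices; the helper names are hypothetical and only the standard library is used.

```python
import math

def trace_exp_sym2(a, b, d):
    """Tr exp(M) for the real symmetric 2x2 matrix M = [[a, b], [b, d]],
    computed from its two eigenvalues mean +/- gap."""
    mean = 0.5 * (a + d)
    gap = math.hypot(0.5 * (a - d), b)
    return math.exp(mean + gap) + math.exp(mean - gap)

def peierls_gap(h, x):
    """log Tr e^{-H0+X} - log Tr e^{-H0} - Tr(rho0 X).

    h = (h1, h2)         -- diagonal entries of H0 (assumed diagonal in this sketch)
    x = (x11, x12, x22)  -- entries of the real symmetric matrix X
    """
    h1, h2 = h
    x11, x12, x22 = x
    z0 = math.exp(-h1) + math.exp(-h2)
    rho = (math.exp(-h1) / z0, math.exp(-h2) / z0)   # Gibbs state rho0 (diagonal)
    mean_x = rho[0] * x11 + rho[1] * x22             # Tr(rho0 X); off-diagonals drop out
    lhs = math.log(trace_exp_sym2(-h1 + x11, x12, -h2 + x22))
    return lhs - math.log(z0) - mean_x
```

The returned gap should be non-negative for every symmetric $X$, and it vanishes when $X$ is a multiple of the identity.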

Definition 5 Let us say that a map $\Phi$, from a linear subspace $\mathcal{X}$ of the space of $H_0$-bounded
quadratic forms, to $\mathbf{R}_+ \cup \{\infty\}$ is a quantum Young function for $\mathcal{X}$ if

1. $\Phi(X)$ is finite for all forms $X$ with sufficiently small Kato bound

2. $X \mapsto \Phi(X)$ is convex

3. $\Phi(-X) = \Phi(X)$ for all $X \in \mathcal{X}$

4. $\Phi(0) = 0$, and if $X \ne 0$, $\Phi(X) > 0$, including $\infty$ as a possible value.

Theorem 6 For each $\rho_0 \in \mathcal{M}$, the map $\Phi$ of (28) is a quantum Young function.

PROOF Lemma (4) of [17] gives the proof of (1).

For (2), it is known that for self-adjoint $A$, the map $A \mapsto \mathrm{Tr}\, e^{A}$ is convex, so that

$$\mathrm{Tr}\, e^{\lambda A + (1-\lambda)B} \le \lambda\, \mathrm{Tr}\, e^{A} + (1-\lambda)\, \mathrm{Tr}\, e^{B}.$$

Put $A = -H_0 - X$ and $B = -H_0 - Y$, where $X$ and $Y$ are sufficiently $H_0$-small forms.
Then $\lambda X + (1-\lambda)Y$ is also a sufficiently small form, and

$$\mathrm{Tr}\, e^{-H_0 - \lambda X - (1-\lambda)Y} = \mathrm{Tr}\, e^{\lambda A + (1-\lambda)B} \le \lambda\, \mathrm{Tr}\, e^{-H_0 - X} + (1-\lambda)\, \mathrm{Tr}\, e^{-H_0 - Y}.$$

Then $\Phi$, being the sum of two convex functions, is convex.
Items (3) and (4) are obvious. QED
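The properties in Definition 5 can be observed numerically for the Young function (28) on a two-level system. This is a finite-dimensional sketch only: $H_0$ is taken diagonal, $X$ real symmetric, and the helper names are assumptions of the example. In finite dimensions every $X$ is bounded, so the domain questions of the infinite-dimensional theory do not arise.

```python
import math

def trace_exp_sym2(a, b, d):
    """Tr exp(M) for the real symmetric 2x2 matrix M = [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    gap = math.hypot(0.5 * (a - d), b)
    return math.exp(mean + gap) + math.exp(mean - gap)

def normalised_levels(e1, e2):
    """Shift the energy levels by log Z0 so that Tr e^{-H0} = 1."""
    log_z = math.log(math.exp(-e1) + math.exp(-e2))
    return (e1 + log_z, e2 + log_z)

def quantum_young(h, x):
    """Phi(X) = (1/2) Tr(e^{-H0+X} + e^{-H0-X}) - 1 for H0 = diag(h1, h2),
    X = [[x11, x12], [x12, x22]], assuming Tr e^{-H0} = 1."""
    h1, h2 = h
    x11, x12, x22 = x
    plus = trace_exp_sym2(-h1 + x11, x12, -h2 + x22)
    minus = trace_exp_sym2(-h1 - x11, -x12, -h2 - x22)
    return 0.5 * (plus + minus) - 1.0
```

When $X$ is diagonal, and so commutes with $H_0$, the value reduces to $\mathrm{Tr}\,\rho_0(\cosh X - 1)$, as stated after (28).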



2.5 The Luxemburg norm 



We now specialize to the Young function of interest, associated with a point $\rho_0 \in \mathcal{M}$. Thus,
$\rho_0 = \exp\{-H_0 - \psi_0\}$, and $e^{-\beta H_0}$ is of trace class for some $\beta < 1$. Let $Q_0$ be the quadratic
form

$$Q_0(\phi) := \|H_0^{1/2}\phi\|^2, \tag{30}$$

and let $X$ be a $Q_0$-bounded quadratic form. If $\|X\|_K > 1$, then $\Phi(X)$ is put equal to $\infty$,
since either $H_0 + X$ or $H_0 - X$ is not bounded below. It might be that even when $\|X\|_K < 1$,
$\Phi(X)$ is still $\infty$, because although $H_\pm := H_0 \pm X$ are both self-adjoint and bounded below,
$e^{-H_\pm}$ might not be of trace class. Let us denote by $\|X\|_k$ the infimum of the $Q_0$-bounds
of $X$ such that $e^{-H_\pm} \in \mathcal{C}_1$, or $\infty$ if $X$ is $Q_0$-tiny. We have shown that $\|X\|_k > 0$. Then
we can define a lower semi-continuous Young function on the one-dimensional set of forms
$\{\lambda X : \lambda \in \mathbf{R}\}$ by $\Phi(\lambda X)$ for small enough $\lambda$, and by



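Theorem 7 With $\Phi$ given by (31), we have

(i)

$$\|X\|_{L,a} := \inf\left\{ r > 0 : \Phi\left(\frac{X}{r}\right) < a \right\}$$

defines a norm on $\mathrm{Span}_{\mathbf{R}}\,\mathcal{X}$.

(ii) All these norms, for various $a > 0$, are equivalent.

PROOF

(i) Obviously, $\|\cdot\|_{L,a} \ge 0$, and for $\lambda \ne 0$,

$$\|\lambda X\|_{L,a} = \inf\left\{ r > 0 : \Phi\left(\frac{\lambda X}{r}\right) < a \right\} = \inf\left\{ |\lambda| s > 0 : \Phi\left(\frac{X}{s}\right) < a \right\} = |\lambda|\,\|X\|_{L,a}.$$

Also, if $X = 0$,

$$\|X\|_{L,a} = \inf\{r > 0 : \Phi(0) < a\} = \inf\{r > 0\} = 0.$$

Conversely, if $X$ is such that $\|X\|_{L,a} = 0$, then there must exist a sequence $r_n \to 0$ such
that

$$\Phi\left(\frac{X}{r_n}\right) < a. \tag{32}$$

But by assumption, if $X \ne 0$, $\Phi(sX) > 0$ for some $s > 0$; convexity then shows that
$\Phi(sX) \to \infty$ at least as fast as linear in $s$, contradicting (32); this shows that $X = 0$.
Finally, for the triangle inequality, put $r = s + t$, $\lambda = s/r$, $1 - \lambda = t/r$. Then the set

$$A := A(a) := \left\{ r : \Phi\left(\frac{X + Y}{r}\right) < a \right\}$$

contains the set

$$A_0 := \left\{ s + t : \lambda\Phi\left(\frac{X}{s}\right) + (1-\lambda)\Phi\left(\frac{Y}{t}\right) < a \right\}.$$

For suppose that $r = s + t \in A_0$. Then

$$\Phi\left(\frac{X+Y}{r}\right) = \Phi\left(\lambda\frac{X}{s} + (1-\lambda)\frac{Y}{t}\right) \le \lambda\Phi\left(\frac{X}{s}\right) + (1-\lambda)\Phi\left(\frac{Y}{t}\right) < a,$$

showing that $r \in A$, and so $A_0 \subseteq A$. The set $A_0$ contains the set

$$A_{00} := \left\{ s + t : \Phi\left(\frac{X}{s}\right) < a \text{ and } \Phi\left(\frac{Y}{t}\right) < a \right\}.$$

For suppose $r \in A_{00}$. Then there exist $s, t$ such that $s + t = r$ and $\Phi(X/s) < a$ and
$\Phi(Y/t) < a$. Then there exists $s + t$ such that

$$\lambda\Phi\left(\frac{X}{s}\right) + (1-\lambda)\Phi\left(\frac{Y}{t}\right) < (\lambda + (1-\lambda))a = a,$$

so $r \in A_0$. This shows that $A_{00} \subseteq A_0 \subseteq A$. Since the infimum of a larger set of real numbers
is not greater than the infimum of a smaller set, we have

$$\begin{aligned}
\|X + Y\|_{L,a} = \inf A &\le \inf A_{00} \\
&= \inf\left\{ s + t : \Phi\left(\frac{X}{s}\right) < a \text{ and } \Phi\left(\frac{Y}{t}\right) < a \right\} \\
&= \inf\left\{ s : \Phi\left(\frac{X}{s}\right) < a \right\} + \inf\left\{ t : \Phi\left(\frac{Y}{t}\right) < a \right\} \\
&= \|X\|_{L,a} + \|Y\|_{L,a}.
\end{aligned}$$

This proves (i).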

(ii) We may assume that $a > b$; then

$$\|X\|_{L,a} \le \|X\|_{L,b},$$

so $\|X\|_{L,b}$ is the stronger norm. It remains to show that it is also weaker. If $X$ is
$Q_0$-tiny, when the Kato seminorm $\|X\|_K$ vanishes, then $\Phi(\lambda X)$ is finite and continuous,
increasing in $\lambda$ to infinity (by convexity). It therefore passes $a$ and $b$ at points $\lambda = a'$ and
$b'$, where

$$a' = \|X\|_{L,a}^{-1} \quad\text{and}\quad b' = \|X\|_{L,b}^{-1}$$

respectively. From convexity,

$$\Phi(b'X) = \Phi\left(\frac{b'}{a'}\,a'X + \left(1 - \frac{b'}{a'}\right)0\right) \le \frac{b'}{a'}\,\Phi(a'X).$$

Thus

$$\frac{\|X\|_{L,a}}{\|X\|_{L,b}} = \frac{b'}{a'} \ge \frac{b}{a},$$

giving

$$\|X\|_{L,b} \le \frac{a}{b}\,\|X\|_{L,a}, \tag{33}$$

showing equivalence in this case. This set-up, in which the infima of the sets $A(a)$ and
$A(b)$ are both achieved at values of $r$ for which $\Phi(X/r)$ is finite, could also arise if
$\|X\|_K > 0$, and leads to the same conclusion.

Now suppose that $\|X\|_K = 1$, and consider $\Phi(\lambda X)$ as a function of $\lambda$. It is possible that
$b$ is not reached by any $\Phi(\lambda X)$ before $\lambda = \|X\|_k^{-1}$, in which case $\|X\|_{L,b} = \|X\|_k$. In that
case $a$ is also not reached by $\Phi(\lambda X)$ before it becomes infinite, and $\|X\|_{L,a} = \|X\|_k$ too,
and the norms are equal, and so equivalent.

The only remaining possibility is that $\Phi(b'X) = b$ for some $b' < \|X\|_k^{-1}$, giving $\|X\|_{L,b} = 1/b'$,
while $a$ is not reached by $\Phi(\lambda X)$ before $\lambda = a'' := \|X\|_k^{-1}$, so that $\|X\|_{L,a} = \|X\|_k = 1/a''$.
Then by convexity,

$$b = \Phi(b'X) = \Phi\left(\frac{b'}{a''}\,a''X + \left(1 - \frac{b'}{a''}\right)\cdot 0\right) \le \frac{b'}{a''}\,\Phi(a''X) \le \frac{b'}{a''}\,a.$$

This can be rearranged to give (33), which completes the proof of (ii). QED

In view of (ii), we can take $a = 1$, and use the notation $\|\cdot\|_L$ for the Luxemburg norm
$\|\cdot\|_{L,1}$. It is clear that $\|X\|_L \ge \|X\|_k$: by our convention, $\Phi(X/r)$ is infinite for $r < \|X\|_k$.
This convention is inevitable; for, if both $\exp\{-H_0 \pm X\}$ are of trace class, there exists $C$
such that $\mathrm{Tr}\exp\{-H_0 \pm X\} \le C$. But the state is a positive operator, so its trace is its
trace norm, which is larger than its operator norm. Hence

$$0 \le \exp\{-H_0 \pm X\} \le CI.$$

Taking logs (an operator monotone operation, also valid for forms) gives

$$\pm X \le H_0 + \log C.$$

Thus $X$ must be $Q_0$-bounded with bound $\le 1$: no larger $\|X\|_K$ can give finite $\Phi(X)$.

It is likely that in our situation, $\Phi(X/r)$ goes smoothly to infinity as $r \to 0$, passing
through all positive values, and diverging to infinity at $r = \|X\|_k$. If this were known to
be true, then the proof of (ii) would be the same as the easy case when $X$ is $Q_0$-tiny.
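The norm properties of Theorem 7 can be illustrated in the simplest setting, a two-level system, where every perturbation is bounded and so corresponds to the easy ($Q_0$-tiny) case of the proof. The sketch below computes $\|X\|_{L,a}$ by bisection; the diagonal $H_0$, the real symmetric $X$ and all function names are assumptions of this example, not constructions from the text.

```python
import math

def trace_exp_sym2(a, b, d):
    """Tr exp(M) for the real symmetric 2x2 matrix M = [[a, b], [b, d]]."""
    mean = 0.5 * (a + d)
    gap = math.hypot(0.5 * (a - d), b)
    return math.exp(mean + gap) + math.exp(mean - gap)

def quantum_young(h, x):
    """Phi(X) = (1/2) Tr(e^{-H0+X} + e^{-H0-X}) - 1, H0 = diag(h), Tr e^{-H0} = 1."""
    h1, h2 = h
    x11, x12, x22 = x
    plus = trace_exp_sym2(-h1 + x11, x12, -h2 + x22)
    minus = trace_exp_sym2(-h1 - x11, -x12, -h2 - x22)
    return 0.5 * (plus + minus) - 1.0

def luxemburg(h, x, a=1.0, tol=1e-12):
    """Approximate inf{ r > 0 : Phi(X / r) < a } by bisection."""
    if all(c == 0.0 for c in x):
        return 0.0

    def modular(r):
        try:
            return quantum_young(h, tuple(c / r for c in x))
        except OverflowError:          # exponent too large: Phi is effectively infinite
            return math.inf

    lo, hi = 0.0, 1.0
    while modular(hi) >= a:            # Phi(X/r) decreases to 0 as r grows
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) >= a:
            lo = mid
        else:
            hi = mid
    return hi

# Energy levels 0 and 1, shifted by log Z0 so that Tr e^{-H0} = 1.
_LOG_Z = math.log(1.0 + math.exp(-1.0))
H0 = (_LOG_Z, 1.0 + _LOG_Z)
```

With $a = 1$ and $b = 1/2$, the computed norms should satisfy $\|X\|_{L,a} \le \|X\|_{L,b} \le (a/b)\|X\|_{L,a}$, which is the content of (33) together with the easy inequality.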



2.6 Duality 

In [3], the authors associate with a Banach manifold $\mathcal{M}$ a whole bundle of tangent spaces,
coming from the various Amari embeddings, $\rho \mapsto \ell_\alpha(\rho) = \rho^{1/p}$. This elegant point of view
actually contains the fact that there is only one tangent space and one cotangent space,
each of which is furnished with a family of affine connections.

We adopt a more concrete version, mainly because we do not yet know whether our
space is complete, uniformly convex etc. as required by [3]. Let $\rho_0 \in \mathcal{M}$. The set of states

$$\mathcal{X} := \{\rho_X : X \text{ is } H_0\text{-}\epsilon\text{-bounded}\}$$

can be furnished with the (+1)-affine structure and with the Luxemburg norm. This space
might not be complete. We parametrise the space by the scores, $X$. The topological dual
$\mathcal{X}^d$ of the completed space of scores will contain density operators with finite entropy, and
possibly unwanted non-normal states and weights. We take the subset $\mathcal{X}^* \subseteq \mathcal{X}^d$ being the
(-1)-linear span of density operators obeying (24) for some $p$, which, as remarked, carries
the (-1)-affine connection. The pair $\mathcal{X}, \mathcal{X}^*$ is generated by the Amari embeddings of the
set of states near $\rho_0$:

$$\rho \mapsto \ell_+(\rho) := \log\rho \quad\text{and its dual}\quad \rho \mapsto \ell_-(\rho) := \rho$$

and their associated affine connections, (+1) and (-1). We may then write

$$S(\rho_0|\rho_X) + S(\rho_X|\rho_0) = \mathrm{Tr}\left[(\log\rho_X - \log\rho_0)(\rho_X - \rho_0)\right].$$

We take the limit so that the differences define tangent vectors, to get the second Gateaux
derivative of the l.h.s. This is known to be the BKM metric

$$\langle X, Y\rangle_B = \mathrm{Tr}\left(d\ell_+(\rho)\, d\ell_-(\rho)\right).$$

This shows that the duality between $\mathcal{X}$ and $\mathcal{X}^*$, given by the trace form, can be expressed
in terms of the BKM metric.

Given a Young function $\Phi$ defined on $\mathcal{X}$, we define the dual Young function $\Phi^*$ on the
dual space $\mathcal{X}^*$ by

$$\Phi^*(\rho_Y) := \sup_X \{\langle X, Y\rangle_B - \Phi(X)\}, \qquad \rho_Y - \rho_0 \in \mathcal{X}^*. \tag{34}$$



Theorem 8 $\Phi^*$ is a Young function, it is lower semi-continuous in the BKM metric, and
Young's inequality

$$\Phi(X) + \Phi^*(\rho_Y) \ge \langle X, Y\rangle_B \tag{35}$$

holds for all $X, Y$.

PROOF Clearly, $\Phi^*$ is even and vanishes at $Y = 0$. For convexity, let $\rho_1$ denote $\rho_{Y_1}$ etc., so
that $\rho_1 - \rho_0$ is the cotangent vector $d\ell_-(\rho_1)$. Then

$$\begin{aligned}
\Phi^*(\lambda\rho_1 + (1-\lambda)\rho_2) &= \sup_X\left\{\lambda\,\mathrm{Tr}\left(X\,d\ell_-(\rho_1)\right) + (1-\lambda)\,\mathrm{Tr}\left(X\,d\ell_-(\rho_2)\right) - \Phi(X)\right\} \\
&\le \lambda\sup_X\{\langle X, Y_1\rangle_B - \Phi(X)\} + (1-\lambda)\sup_X\{\langle X, Y_2\rangle_B - \Phi(X)\} \\
&= \lambda\Phi^*(\rho_1) + (1-\lambda)\Phi^*(\rho_2).
\end{aligned}$$

It follows from $\Phi^*(\rho_0) = 0$ and convexity that $\Phi^*(\rho_Y) \ge 0$ or is $\infty$.

$\Phi^*$, being the supremum of a family of continuous functions (indeed, continuous linear
functions), is lower semi-continuous. For Young's inequality, $\Phi^*(\rho_Y)$, being the supremum of
$\langle X, Y\rangle_B - \Phi(X)$, cannot be smaller than any example. QED
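The double dual obeys $\Phi^{**} \le \Phi$; for,

$$\begin{aligned}
\Phi^{**}(X) &= \sup_Y\{\langle X, Y\rangle_B - \Phi^*(\rho_Y)\} \\
&= \sup_Y\left\{\langle X, Y\rangle_B - \sup_{X'}\{\langle X', Y\rangle_B - \Phi(X')\}\right\} \\
&= \sup_Y \inf_{X'}\{\langle X - X', Y\rangle_B + \Phi(X')\} \\
&\le \sup_Y\{\langle X - X', Y\rangle_B + \Phi(X')\}
\end{aligned}$$

for all $X'$. Choosing $X' = X$ gives the inequality.

It follows that $(\Phi^*)^{**} \le \Phi^*$. But we also have the inequality the other way round:

$$\begin{aligned}
(\Phi^*)^{**}(\rho_Y) = (\Phi^{**})^*(\rho_Y) &= \sup_X\{\langle X, Y\rangle_B - \Phi^{**}(X)\} \\
&\ge \sup_X\{\langle X, Y\rangle_B - \Phi(X)\} \\
&= \Phi^*(\rho_Y),
\end{aligned}$$

so $(\Phi^*)^{**} = \Phi^*$. This duality occurs because $\Phi^*$ is lower semi-continuous. Indeed, $\Phi^{**}$ is
the lower semi-continuous version of $\Phi$. From now on, we shall assume that $\Phi$ is lower
semi-continuous, so that $\Phi^{**} = \Phi$.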

We now consider the quantum analogue of the inequality (16): the classical Young
function $\lambda \mapsto \Phi(\lambda X)$ is continuous and increasing where finite. It follows that the infimum
in Theorem 7 is achieved at $\lambda = r^{-1} = \|X\|_L^{-1}$. Similarly for the dual Luxemburg norm. Now
let $\|X\|_L = 1$ and $\|\rho_Y - \rho_0\|_{L^*} = 1$. Then $\Phi(X) = 1$ and $\Phi^*(\rho_Y) = 1$, and by Young's
inequality

$$2\|X\|_L\,\|\rho_Y - \rho_0\|_{L^*} = 2 = \Phi(X) + \Phi^*(\rho_Y) \ge \mathrm{Tr}\left[X(\rho_Y - \rho_0)\right],$$

which multiplies up to give for tangent and cotangent vectors

$$\langle X, Y\rangle_B \le 2\|X\|_L\,\|\rho_Y - \rho_0\|_{L^*}. \tag{36}$$

3 Conclusion 

We have argued that the information manifold in quantum theory should consist of density
operators $\rho$, some fractional power of which is still of trace class. The topology on the
manifold should not be given by the trace norm. Instead, a 'hood of a given state $\rho_0$ should
be given by $\epsilon$-bounded quadratic forms; these were shown [H] to make up a possible analogue
of the Cramér class of random variables, in that their Kubo-Mori expansion is analytic. This
set of states carries the (+1) affine structure of Amari. A possible Young function, related
to the free energy, was introduced. The dual Young function was shown to be finite on a set,
the union of all $p$-nearby states, and this carries the (-1)-affine structure of Amari. The
beginnings of Young theory (the BKM-metric, the Luxemburg norms, Young's inequality
and the Hölder-Orlicz inequality) were derived.

Let us now complete $\mathcal{X}$ in the Luxemburg norm, and $\mathcal{X}^*$ in the dual Luxemburg norm.
The quantum Hölder-Orlicz inequality (36) then shows that the bilinear form between
the spaces remains finite; we can therefore extend the definition of the BKM-metric to
the completions. The two Banach spaces thus obtained contain only normal states. The
tangent and cotangent spaces are then complete and dual relative to the Hilbert-Schmidt
scalar product, and are furnished with the ($\pm 1$)-affine structures. The tangent space then
contains the set of operators with finite Araki norm, and the cotangent space contains the
states which are perturbations of $\rho_0$ by such operators.

The Luxemburg norm becomes large when we add a perturbation such that one of
$e^{-H_0 \pm X}$ nearly ceases to be of trace class. In this way, the manifold consists of points that
are in the interior of some one-parameter exponential model. All states in the manifold
have finite entropy, and states near $\rho_0$ have finite relative entropy to $\rho_0$.

One important property of the theory remains unproved: the equivalence of the Luxemburg
norms based on points $\rho_0$ and $\rho_X$ for perturbations $Y$ lying in the overlaps of any
'hoods of $\rho_0$ with any 'hood of $\rho_X$. It would also be nice for the dual affine structures to
be defined on the same space. In the classical case, this was resolved by Grasselli T in the 
subtheory obtained by completing the space of bounded perturbations in the Luxemburg 
norm, to obtain the (separable) Banach space M. Then the information manifold becomes 
a Banach manifold modelled on M. In the quantum case, the analogue of this space seems 
to be the completion in the Luxemburg norm of the linear space consisting of perturbations 
of finite Araki norm. One can ask whether this completion consists of only tiny forms. 